SurgMAE: Masked Autoencoders for Long Surgical Video Analysis
There has been growing interest in using deep learning models to process
long surgical videos, in order to automatically detect clinical/operational
activities and extract metrics that can enable workflow-efficiency tools and
applications. However, training such models requires vast amounts of labeled
data, which is costly to obtain and does not scale. Recently, self-supervised
learning has been explored in the computer vision community to reduce the
annotation burden. Masked autoencoders (MAE) have gained attention in the
self-supervised paradigm for Vision Transformers (ViTs): they predict
randomly masked regions given the visible patches of an image or a video
clip, and have shown superior performance on benchmark datasets.
However, the application of MAE to surgical data remains unexplored. In this
paper, we first investigate whether MAE can learn transferable representations
in the surgical video domain. We propose SurgMAE, a novel architecture with a
masking strategy for MAE based on sampling tokens with high spatio-temporal
information. We provide an empirical study of SurgMAE on two large-scale
datasets of long surgical videos, and find that our method outperforms several
baselines in the low-data regime. We conduct extensive ablation studies to
show the efficacy of our approach, and also demonstrate its superior
performance on UCF-101, proving its generalizability to non-surgical datasets
as well.
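The core MAE pre-training step the abstract describes is masking most patch tokens and reconstructing them from the visible remainder. As a minimal sketch of the token-selection step only, the snippet below shows plain uniform random masking (the default MAE baseline); SurgMAE's actual strategy samples high-information spatio-temporal tokens instead, and the function name and ratio here are illustrative assumptions, not the paper's code.

```python
import numpy as np

def random_mask(num_tokens: int, mask_ratio: float, rng=None):
    """Pick which patch tokens are hidden from the encoder.

    Returns boolean arrays (visible, masked) over token indices,
    mirroring the random-masking step of a masked autoencoder.
    """
    rng = rng or np.random.default_rng(0)
    num_masked = int(num_tokens * mask_ratio)
    perm = rng.permutation(num_tokens)
    masked = np.zeros(num_tokens, dtype=bool)
    masked[perm[:num_masked]] = True
    return ~masked, masked

# 196 tokens (a 14x14 patch grid) at the 75% ratio common for MAE:
visible, masked = random_mask(196, 0.75)
print(visible.sum(), masked.sum())  # 49 147
```

The encoder then runs only on the 49 visible tokens, which is what makes MAE pre-training cheap relative to processing every patch.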
Tracking and Mapping in Medical Computer Vision: A Review
As computer vision algorithms become more capable, their applications in
clinical systems will become more pervasive. These applications include
diagnostics such as colonoscopy and bronchoscopy, guiding biopsies,
minimally invasive interventions, and surgery, automating instrument motion,
and providing image guidance using pre-operative scans. Many of these
applications depend on the specific visual nature of medical scenes and
require designing and applying algorithms that perform well in this environment.
In this review, we provide an update to the field of camera-based tracking
and scene mapping in surgery and diagnostics in medical computer vision. We
begin with describing our review process, which results in a final list of 515
papers that we cover. We then give a high-level summary of the state of the art
and provide relevant background for those who need tracking and mapping for
their clinical applications. We then review datasets provided in the field and
the clinical needs therein. Then, we delve in depth into the algorithmic side,
and summarize recent developments, which should be especially useful for
algorithm designers and to those looking to understand the capability of
off-the-shelf methods. We focus on algorithms for deformable environments while
also reviewing the essential building blocks in rigid tracking and mapping
since there is a large amount of crossover in methods. Finally, we discuss the
current state of the tracking and mapping methods along with needs for future
algorithms, needs for quantification, and the viability of clinical
applications in the field. We conclude that new methods need to be designed or
combined to support clinical applications in deformable environments, and more
focus needs to be put into collecting datasets for training and evaluation.
Comment: 31 pages, 17 figures
Image and haptic guidance for robot-assisted laparoscopic surgery
Surgical removal of the prostate gland using the da Vinci surgical robot is the state-of-the-art treatment option for organ-confined prostate cancer. The da Vinci system provides excellent 3D visualization of the surgical site and improved dexterity, but it lacks haptic force feedback and subsurface tissue visualization. The overall objective of the work done in this thesis is to augment the existing visualization tools of the da Vinci with ones that can identify the prostate boundary, critical structures, and cancerous tissue, so that prostate resection can be carried out with minimal damage to the adjacent critical
structures, and therefore, with minimal complications. Towards this objective, we designed and implemented a real-time image guidance system based on a robotic transrectal ultrasound (R-TRUS) platform that works in tandem with the da Vinci surgical system and tracks its surgical instruments. In addition to ultrasound as an intrinsic imaging modality, the system was first used to bring pre-operative magnetic resonance imaging (MRI) into the operating room by registering the pre-operative MRI to the intraoperative ultrasound and displaying the MRI image at the correct physical location based on the real-time ultrasound image. Second, a method of using the R-TRUS system for tissue palpation is proposed by expanding it to work in conjunction with a real-time strain imaging technique. Third, another system based on the R-TRUS is described for detecting dominant prostate tumors, based on a combination of features extracted from a novel
multi-parametric quantitative ultrasound elastography technique. We tested our systems in an animal study followed by human patient studies involving n = 49 patients undergoing da Vinci prostatectomy. The clinical studies were conducted to evaluate the feasibility of using these systems in real human procedures, and also to improve and optimize our imaging systems using patient data.
Finally, a novel force feedback control framework is presented as a solution to the lack of haptic feedback in the current clinically used surgical
robots. The framework has been implemented on the da Vinci surgical system using the da Vinci Research Kit controllers and its performance has
been evaluated by conducting user studies.
Faculty of Applied Science, Department of Electrical and Computer Engineering (Graduate)
Adaptation of Surgical Activity Recognition Models Across Operating Rooms
Automatic surgical activity recognition enables more intelligent surgical
devices and a more efficient workflow. Integration of such technology in new
operating rooms has the potential to improve care delivery to patients and
decrease costs. Recent works have achieved a promising performance on surgical
activity recognition; however, the lack of generalizability of these models is
one of the critical barriers to the wide-scale adoption of this technology. In
this work, we study the generalizability of surgical activity recognition
models across operating rooms. We propose a new domain adaptation method to
improve the performance of the surgical activity recognition model in a new
operating room for which we only have unlabeled videos. Our approach generates
pseudo-labels for the unlabeled video clips on which the model is confident,
and trains the model on augmented versions of those clips. We extend our method to a
semi-supervised domain adaptation setting where a small portion of the target
domain is also labeled. In our experiments, our proposed method consistently
outperforms the baselines on a dataset of more than 480 long surgical videos
collected from two operating rooms.
Comment: MICCAI 202
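The pseudo-labeling step described above can be sketched as a simple confidence filter over the model's softmax outputs on the unlabeled target-domain clips. This is a minimal illustration under assumed names and an assumed 0.9 threshold, not the paper's implementation; the selected clips would then be augmented and used for retraining.

```python
import numpy as np

def select_pseudo_labels(probs: np.ndarray, threshold: float = 0.9):
    """Keep only target-domain clips the model is confident about.

    probs: (num_clips, num_classes) softmax outputs on unlabeled clips.
    Returns (indices, labels) for clips whose top class probability
    meets the threshold.
    """
    confidence = probs.max(axis=1)
    keep = np.where(confidence >= threshold)[0]
    return keep, probs[keep].argmax(axis=1)

probs = np.array([[0.95, 0.05],
                  [0.60, 0.40],   # too uncertain: discarded
                  [0.10, 0.90]])
idx, labels = select_pseudo_labels(probs, threshold=0.9)
print(idx, labels)  # [0 2] [0 1]
```

Only the confidently labeled clips enter training, which limits the noise that wrong pseudo-labels would otherwise inject into the adapted model.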
SENDD: Sparse Efficient Neural Depth and Deformation for Tissue Tracking
Deformable tracking and real-time estimation of 3D tissue motion is essential
to enable automation and image guidance applications in robotically assisted
surgery. Our model, Sparse Efficient Neural Depth and Deformation (SENDD),
extends prior 2D tracking work to estimate flow in 3D space. SENDD introduces
novel contributions of learned detection, and sparse per-point depth and 3D
flow estimation, all with less than half a million parameters. SENDD does this
by using graph neural networks of sparse keypoint matches to estimate both
depth and 3D flow. We quantify and benchmark SENDD on a comprehensively
labelled tissue dataset, and compare it to an equivalent 2D flow model. SENDD
performs comparably while enabling applications that 2D flow cannot. SENDD can
track points and estimate depth at 10 fps on an NVIDIA RTX 4000 for 1280
tracked (query) points, and its cost scales linearly with the number of
points. SENDD enables multiple downstream applications that require 3D motion
estimation.
Comment: 12 pages, 4 figures
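SENDD's efficiency comes from operating on a sparse graph of keypoints rather than a dense image grid. As a generic illustration of that kind of structure (the abstract does not specify how SENDD builds its graph, so the construction below is an assumption), here is a k-nearest-neighbour edge list over sparse 2D keypoints of the sort a graph neural network could message-pass over:

```python
import numpy as np

def knn_edges(points: np.ndarray, k: int = 2):
    """Build k-nearest-neighbour edges over sparse keypoints.

    points: (N, 2) keypoint coordinates.
    Returns an (N * k, 2) array of directed edges (i -> neighbour).
    """
    d = np.linalg.norm(points[:, None] - points[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)           # exclude self-edges
    nbrs = np.argsort(d, axis=1)[:, :k]   # k closest per point
    src = np.repeat(np.arange(len(points)), k)
    return np.stack([src, nbrs.ravel()], axis=1)

pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
edges = knn_edges(pts, k=2)
print(edges.shape)  # (8, 2)
```

Because each point contributes a fixed number of edges, the graph (and hence per-frame cost) grows linearly with the number of query points, consistent with the scaling behaviour the abstract reports.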
DETC2009/MESA-86451 DESIGN AND RECONFIGURATION ALGORITHM OF HEXBOT: A MODULAR SELF-RECONFIGURABLE ROBOTIC SYSTEM
ABSTRACT This paper primarily addresses the design and implementation of a planar hexagonal Modular Self-Reconfigurable Robotic System (MSRRS), along with the construction of its reconfiguration path planner and control algorithm. A universal module is carefully designed to be in line with the common goals of MSRRS, including homogeneity, cost-effectiveness, fast actuation, and quick and strong connections. Although the implemented working prototype is both large and restricted to a planar geometry, it is designed such that its hardware and software can be scaled up in the number of units and down in unit size; similarly, the platform has the potential to be extended to 3D applications. The software infrastructure of this platform is designed so that different hierarchies for distributed control and communication can be implemented. The algorithmic design is based on a hierarchical multilayer approach, where upper layers decompose the problem into sub-problems solvable by lower layers. An optimal reconfiguration path planner is developed to minimize the number of module movements during reconfiguration while enforcing collision-avoidance and connectivity constraints, in addition to taking into account the kinematic model of the platform. The core of the algorithm relies on a heuristic function and a Markov Decision Process (MDP) optimization to generate a near-optimal reconfiguration path planner and a control algorithm for HexBot.
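The MDP optimization at the core of the planner can be illustrated with generic value iteration over a finite state space. This is a textbook sketch on a toy two-state problem, not HexBot's planner: the real system layers a heuristic plus collision-avoidance and connectivity constraints on top, and its states are module configurations rather than the abstract states assumed here.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, iters=200):
    """Generic value iteration for a finite MDP.

    P: (A, S, S) transition probabilities, R: (A, S) rewards.
    Returns converged state values and the greedy policy.
    """
    A, S, _ = P.shape
    V = np.zeros(S)
    for _ in range(iters):
        Q = R + gamma * (P @ V)  # (A, S) action values
        V = Q.max(axis=0)        # Bellman backup
    return V, Q.argmax(axis=0)

# Toy 2-state chain: action 1 moves toward (and stays in) the
# rewarding state, action 0 moves toward the unrewarding one.
P = np.array([[[1.0, 0.0], [1.0, 0.0]],
              [[0.0, 1.0], [0.0, 1.0]]])
R = np.array([[0.0, 0.0],
              [1.0, 1.0]])
V, policy = value_iteration(P, R)
print(policy)  # [1 1]
```

In a reconfiguration setting, rewards would penalize module movements and states violating connectivity, so the greedy policy approximates a minimum-movement reconfiguration path.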